9 research outputs found

    Random forests with random projections of the output space for high dimensional multi-label classification

    We adapt random projections of the output space to enhance tree-based ensemble methods for multi-label classification. We show how the time complexity of learning can be reduced without affecting the computational complexity or accuracy of predictions. We also show, over a broad panel of benchmark problems, that random output-space projections can be used to reach different bias-variance tradeoffs, and that this may improve accuracy while significantly reducing the computational burden of the learning stage.

    PMG: Multi-core metabolite identification

    Distributed computing has been considered for decades as a promising way of speeding up software execution, resulting in a valuable collection of safe and efficient concurrent algorithms. With the spread of multi-core processors, parallelization has moved to the center of attention and brought new challenges, especially regarding scalability to tens or even hundreds of parallel cores. In this paper, we present a scalable multi-core tool for the metabolomics community. This tool addresses the problem of metabolite identification, which is currently a bottleneck in the metabolomics pipeline. Analytical BioScience

    Novel techniques for automorphism group computation

    Graph automorphism (GA) is a classical problem in which the objective is to compute the automorphism group of an input graph. In this work we propose four novel techniques to speed up algorithms that solve the GA problem by exploring a search tree. They improve the performance of the algorithm by reducing the depth of the search tree and by pruning it effectively. We formally prove that a GA algorithm that uses these techniques correctly computes the automorphism group of the input graph. We also describe how the techniques have been incorporated into the GA algorithm conauto, as conauto-2.03, with at most an additive polynomial increase in its asymptotic time complexity. We have experimentally evaluated the impact of each of these techniques on several graph families. Each technique by itself significantly reduces the number of processed search-tree nodes in some subset of graphs, which justifies the use of each of them. When they are applied together, their effects combine, reducing the number of processed nodes in most graphs. This is also reflected in a reduction of the running time, which is substantial in some graph families.
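
To make the problem statement above concrete, here is a minimal brute-force sketch of what "computing the automorphism group" means. It is not conauto and uses none of the four techniques from the abstract: it simply tries every vertex permutation, which is only feasible for very small graphs. The function name `automorphisms` and the example graph are assumptions made for illustration.

```python
from itertools import permutations


def automorphisms(n, edges):
    """Return every permutation of vertices 0..n-1 that maps the edge set onto itself.

    Hypothetical helper for illustration: exhaustive search over all n! permutations,
    assuming a simple undirected graph given as a list of (u, v) pairs.
    """
    edge_set = {frozenset(e) for e in edges}
    group = []
    for perm in permutations(range(n)):
        # Relabel both endpoints of every edge and compare with the original edge set.
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
        if mapped == edge_set:
            group.append(perm)
    return group


# The 4-cycle 0-1-2-3-0 has the symmetries of a square (rotations and reflections).
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
group = automorphisms(4, cycle4)
print(len(group))  # 8
```

Search-tree algorithms such as the one described in the abstract avoid this factorial enumeration; the sketch only fixes the definition of the output they compute.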

    Visual Network Analysis of Dynamic Metabolic Pathways

    We extend our previous work on the exploration of static metabolic networks to evolving, and therefore dynamic, pathways. We apply our visualization software to data from a simulation of early metabolism. In doing so, we show that our technique allows us to test and argue for or against different scenarios for the evolution of metabolic pathways. This supports a thorough and efficient analysis of the structure and properties of the generated metabolic networks and their underlying components, while giving the user a vivid impression of the dynamics of the system. The analysis process is inspired by Ben Shneiderman’s mantra of information visualization. For the overview, user-defined diagrams give insight into topological changes of the graph as well as changes in the attribute set associated with the participating enzymes, substances and reactions. In this way, “interesting features” in time as well as in space can be recognized. A linked-view implementation enables navigation into more detailed layers for in-depth analysis of individual network configurations.

    Application of Conformal Prediction in QSAR

    Part 4: First Conformal Prediction and Its Applications Workshop (COPA 2012). QSAR modeling is a method for predicting properties, e.g. the solubility or toxicity, of chemical compounds using statistical learning techniques. QSAR is in widespread use within the pharmaceutical industry to prioritize compounds for experimental testing or to alert for potential toxicity. However, predictions from a QSAR model are difficult to assess if their prediction intervals are unknown. In this paper we introduce conformal prediction into the QSAR field to address this issue. We apply support vector machine regression in combination with two nonconformity measures to five datasets of different sizes to demonstrate the usefulness of conformal prediction in QSAR modeling. One of the nonconformity measures provides prediction intervals with almost the same width as the size of the QSAR models’ prediction errors, showing that the prediction intervals obtained by conformal prediction are efficient and useful.
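
As a rough illustration of how conformal prediction attaches intervals to a regression model, the sketch below uses the split (inductive) variant with the absolute residual as nonconformity measure. Several things here are assumptions and not taken from the paper: the abstract does not name its two nonconformity measures, a plain least-squares line stands in for the support vector regression model, and the data and all function names are made up.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in (assumption) for the SVR model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def conformal_half_width(model, cal_x, cal_y, significance):
    """Interval half-width derived from residuals on a held-out calibration set."""
    slope, intercept = model
    # Nonconformity score (assumed): absolute residual of each calibration point.
    scores = sorted(abs(y - (slope * x + intercept)) for x, y in zip(cal_x, cal_y))
    n = len(scores)
    # Rank of the (1 - significance) quantile with the finite-sample (n + 1) correction.
    k = math.ceil((n + 1) * (1 - significance))
    if k > n:
        return math.inf  # too few calibration points for this significance level
    return scores[k - 1]


def predict_interval(model, half_width, x):
    """Symmetric interval around the point prediction."""
    slope, intercept = model
    centre = slope * x + intercept
    return centre - half_width, centre + half_width


# Made-up data: the training set lies on y = 2x, the calibration set is noisy.
model = fit_line([0, 1, 2, 3], [0, 2, 4, 6])
cal_x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
cal_y = [0.5, 1.0, 5.5, 4.0, 10.5, 7.0, 15.5, 10.0, 20.5]
half_width = conformal_half_width(model, cal_x, cal_y, significance=0.2)
print(predict_interval(model, half_width, 10))
```

Under the usual exchangeability assumption, intervals built this way cover the true value with probability at least 1 - significance; how narrow they are depends on the model and on the nonconformity measure chosen.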

    Thermodynamic Properties Of Asphaltenes: A Predictive Approach Based On Computer Assisted Structure Elucidation And Atomistic Simulations

    Crude oil is a complex mixture of hydrocarbons and heteroatomic organic compounds of varying molecular weight and polarity [1]. A common practice in the petroleum industry is to separate crude oil into four chemically distinct fractions: saturates, aromatics, asphaltenes and resins [1-4]. Asphaltenes are operationally defined as the non-volatile and polar fraction of petroleum that is insoluble in n-alkanes (e.g., pentane). Conversely, resins are defined as the non-volatile and polar fraction of crude oil that is soluble in n-alkanes (e.g., pentane) and aromatic solvents (e.g., toluene) and insoluble in ethyl acetate. A commonly accepted view in petroleum chemistry is that asphaltenes form micelles which are stabilized by adsorbed resins kept in solution by aromatics [5,6]. Two key parameters that control the stability of asphaltene micelles in a crude oil are the ratio of aromatics to saturates and that of resins to asphaltenes.

    Virtual porous carbons: what they are and what they can be used for

    We use the term “virtual porous carbon” (VPC) to describe computer-based molecular models of nanoporous carbons that go beyond the ubiquitous slit pore model and seek to engage with the geometric, topological and chemical heterogeneity that characterises almost every form of nanoporous carbon. A small number of these models have been developed and used since the early 1990s. These models and their use are reviewed. Included are three more detailed examples of the use of our VPC model. The first is concerned with the study of solid-like adsorbate in nanoporous carbons, the second with the absolute assessment of multi-isotherm based methods for determining the fractal dimension, and the final one is concerned with the fundamental study of diffusion in nanoporous carbons. M. J. Biggs and A. But